NSF PAR Search | NSF Public Access Repository

Information-based Optimal Subdata Selection for Clusterwise Linear Regression

https://doi.org/10.5705/ss.202023.0302

Liu, Yanxi; Stufken, John; Yang, Min (January 2026, Statistica Sinica)

Free, publicly-accessible full text available January 1, 2027
Bootstrap aggregated designs for generalized linear models

https://doi.org/10.52933/jdssv.v5i1.123

Rios, Nicholas; Stufken, John (January 2025, Journal of Data Science, Statistics, and Visualisation)

Many experiments require modeling a non-Normal response. In particular, count responses and binary responses are quite common. The relationship between the predictors and the response is typically modeled via a Generalized Linear Model (GLM). Finding D-optimal designs for GLMs, which reduce the generalized variance of the model coefficients, is desired. A common approach to finding optimal designs for GLMs is to use a local design, but local designs are vulnerable to parameter misspecification. The focus of this paper is to provide designs for GLMs that are robust to parameter misspecification. This is done by applying a bagging procedure to pilot data, where the results of many locally optimal designs are aggregated to produce an approximate design that reflects the uncertainty in the model coefficients. Results show that the proposed bagging procedure is robust to changes in the underlying model parameters. Furthermore, the proposed designs are shown to be preferable to traditional methods, which may be over-conservative.
Full Text Available
Orthogonal Arrays: A Review

https://doi.org/10.1002/wics.70029

Lin, C. Devon; Stufken, John (May 2025, WIREs Computational Statistics)

Orthogonal arrays are arguably one of the most fascinating and important statistical tools for efficient data collection. They have a simple, natural definition, desirable properties when used as fractional factorials, and a rich and beautiful mathematical theory. Their connections with combinatorics, finite fields, geometry, and error‐correcting codes are profound. Orthogonal arrays have been widely used in agriculture, engineering, manufacturing, and high‐technology industries for quality and productivity improvement experiments. In recent years, they have drawn rapidly growing interest from fields such as computer experiments, integration, visualization, optimization, big data, and machine learning/artificial intelligence through successful applications in those fields. We review the fundamental concepts and statistical properties and report recent developments. Discussions of recent applications and connections with various fields are presented. Some interesting open research directions are also presented.
Factor selection in screening experiments by aggregation over random models

https://doi.org/10.1016/j.csda.2024.107940

Singh, Rakhi; Stufken, John (June 2024, Computational Statistics & Data Analysis)

Full Text Available
Optimal designs for generalized linear mixed models based on the penalized quasi-likelihood method

https://doi.org/10.1007/s11222-023-10279-3

Shi, Yao; Yu, Wanchunzi; Stufken, John (October 2023, Statistics and Computing)

While generalized linear mixed models are useful, optimal design questions for such models are challenging due to the complexity of the information matrices. For longitudinal data, after comparing three approximations for the information matrices, we propose an approximation based on the penalized quasi-likelihood method. We evaluate this approximation for logistic mixed models with time as the single predictor variable. Assuming that the experimenter controls at which time observations are to be made, the approximation is used to identify locally optimal designs based on the commonly used A- and D-optimality criteria. The method can also be used for models with random block effects. Locally optimal designs found by a Particle Swarm Optimization algorithm are presented and discussed. As an illustration, optimal designs are derived for a study on self-reported disability in older women. Finally, we also study the robustness of the locally optimal designs to mis-specification of the covariance matrix for the random effects.
Full Text Available
Scalable level-wise screening experiments using locating arrays

https://doi.org/10.1080/00224065.2023.2220973

Akhtar, Yasmeen; Zhang, Fan; Colbourn, Charles J; Stufken, John; Syrotiuk, Violet R (October 2023, Journal of Quality Technology)

Alternative design and analysis methods for screening experiments based on locating arrays are presented. The number of runs in a locating array grows logarithmically in the number of factors, providing efficient methods for screening complex engineered systems, especially those with large numbers of categorical factors having different numbers of levels. Our analysis method focuses on levels of factors in the identification of important main effects and two-way interactions. We demonstrate the validity of our design and analysis methods on both well-studied and synthetic data sets and investigate both statistical and combinatorial properties of locating arrays that appear to be related to their screening capability.
Full Text Available
Selection of Two-Level Supersaturated Designs for Main Effects Models

https://doi.org/10.1080/00401706.2022.2102080

Singh, Rakhi; Stufken, John (January 2023, Technometrics)

Full Text Available
Subdata Selection With a Large Number of Variables

https://doi.org/10.51387/23-NEJSDS36

Singh, Rakhi; Stufken, John (January 2023, The New England Journal of Statistics in Data Science)

Subdata selection from big data is an active area of research that facilitates inferences based on big data with limited computational expense. For linear regression models, the optimal design-inspired Information-Based Optimal Subdata Selection (IBOSS) method is a computationally efficient method for selecting subdata that has excellent statistical properties. But the method can only be used if the subdata size, k, is at least twice the number of regression variables, p. In addition, even when $k \ge 2p$, under the assumption of effect sparsity, one can expect to obtain subdata with better statistical properties by trying to focus on active variables. Inspired by recent efforts to extend the IBOSS method to situations with a large number of variables p, we introduce a method called Combining Lasso And Subdata Selection (CLASS) that, as shown, improves on other proposed methods in terms of variable selection and building a predictive model based on subdata when the full data size n is very large and the number of variables p is large. In terms of computational expense, CLASS is more expensive than recent competitors for moderately large values of n, but the roles reverse under effect sparsity for extremely large values of n.
Full Text Available
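The requirement in the abstract above that k be at least 2p can be illustrated with a minimal sketch of the basic IBOSS selection rule as it is commonly described: for each of the p variables in turn, take the r = k/(2p) smallest and the r largest values among the points not yet selected. This is an illustration under that assumption, not the CLASS method of the paper; the function name `iboss_select` and the pure-Python data layout (a list of rows) are hypothetical choices made for the example.

```python
import random


def iboss_select(rows, k):
    """Return indices of subdata chosen by a basic IBOSS-style rule.

    rows: list of equal-length lists of numbers (n points, p variables).
    k:    requested subdata size; the rule needs k >= 2p so that r >= 1.
    """
    n = len(rows)
    p = len(rows[0])
    r = k // (2 * p)  # points taken from each tail of each variable
    if r < 1:
        raise ValueError("subdata size k must be at least 2p")
    if 2 * p * r > n:
        raise ValueError("subdata size exceeds the number of data points")

    available = set(range(n))
    chosen = []
    for j in range(p):
        # Order the remaining points by variable j (ties broken by index).
        order = sorted(available, key=lambda i: (rows[i][j], i))
        picked = order[:r] + order[-r:]  # r smallest and r largest values
        chosen.extend(picked)
        available.difference_update(picked)
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
    subdata_idx = iboss_select(data, k=24)  # p = 3, so r = 4
    print(len(subdata_idx))
```

With k < 2p the integer r is zero, so no point can be taken from either tail of any variable, which is one way to see where the constraint comes from. The full sort used here is for clarity only; a partial selection would be cheaper for very large n.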
Locally D-Optimal Designs for Binary Responses and Multiple Continuous Design Variables

https://doi.org/10.1007/s40953-022-00304-z

Wang, Zhongshen; Stufken, John (September 2022, Journal of Quantitative Economics)

Full Text Available
Orthogonal Array Based Locally D-Optimal Designs for Binary Responses in the Presence of Factorial Effects

https://doi.org/10.1007/s42519-021-00224-w

Wang, Zhongshen; Stufken, John (December 2021, Journal of Statistical Theory and Practice)

Full Text Available
